Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
Many large-scale machine learning problems (such as clustering, non-parametric learning, and kernel machines) require selecting, out of a massive data set, a manageable, representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set, but for truly large-scale problems, centralizing the data is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple two-stage protocol, GreeDI, that is easily implemented using MapReduce-style computations. We theoretically analyze our approach and show that, under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference on tens of millions of examples using Hadoop.
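The two-stage protocol can be sketched in a few lines: partition the data, run the classical greedy on each partition in parallel, then run greedy once more over the union of the partial solutions. Below is a minimal single-process sketch under a toy coverage objective; the names `greedi`, `greedy`, and `cover` are illustrative, not the paper's code.

```python
import random

def greedy(candidates, k, f):
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    selected = []
    for _ in range(k):
        best = max((e for e in candidates if e not in selected),
                   key=lambda e: f(selected + [e]) - f(selected))
        selected.append(best)
    return selected

def greedi(data, k, m, f):
    """Two-stage sketch: split data across m machines, run greedy on each shard,
    then run greedy again over the union of the m partial solutions."""
    shards = [data[i::m] for i in range(m)]
    partial = [greedy(shard, k, f) for shard in shards]   # stage 1 (parallel in practice)
    merged = [e for sol in partial for e in sol]
    return greedy(merged, k, f)                           # stage 2 (single machine)

# Toy monotone submodular function: coverage of a small universe.
ground_sets = {i: set(random.Random(i).sample(range(50), 8)) for i in range(40)}
cover = lambda S: len(set().union(*(ground_sets[e] for e in S))) if S else 0
solution = greedi(list(ground_sets), k=5, m=4, f=cover)
```

In a real MapReduce deployment, stage 1 runs as the map phase (one greedy per shard) and stage 2 as a single reduce over the merged candidate set.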
Fast Distributed k-Center Clustering with Outliers on Massive Data
Clustering large data is a fundamental problem with a vast number of applications. Due to the increasing size of data, practitioners interested in clustering have turned to distributed computation methods. In this work, we consider the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers. In the noise-free setting we demonstrate that a previously proposed distributed method is in fact an O(1)-approximation algorithm, which accurately explains its strong empirical performance. Additionally, in the noisy setting, we develop a novel distributed algorithm that is also an O(1)-approximation. These algorithms are highly parallel and lend themselves to virtually any distributed computing framework. We compare both algorithms empirically against the best known noisy sequential clustering methods and show that both distributed algorithms consistently perform close to their sequential versions. The algorithms are all one can hope for in distributed settings: they are fast, memory-efficient, and they match their sequential counterparts.
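The sequential baseline for noise-free k-center is the classic farthest-first (Gonzalez) greedy, a 2-approximation; distributed variants typically run it on each machine's shard and then again over the union of the local centers. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import math

def gonzalez_k_center(points, k):
    """Farthest-first traversal: a classic 2-approximation for k-center.
    Returns the chosen centers and the resulting clustering radius."""
    centers = [points[0]]                       # arbitrary first center
    dist = [math.dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        i = max(range(len(points)), key=lambda j: dist[j])  # farthest point
        centers.append(points[i])
        for j, p in enumerate(points):          # update distance to nearest center
            dist[j] = min(dist[j], math.dist(p, centers[-1]))
    return centers, max(dist)

pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
centers, radius = gonzalez_k_center(pts, k=2)
```

On this toy input the algorithm picks (0, 0) and then its farthest point (10, 10), leaving a covering radius of 10.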
Positive region preserved random sampling: an efficient feature selection method for massive data
Bai, Hexiang, Li, Deyu, Liang, Jiye, Zhai, Yanhui
Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally lack sufficient computing resources when faced with huge volumes of data. This paper develops a new method based on sampling techniques and rough set theory to address the challenge of feature selection for massive data. To this end, the paper proposes measuring the discriminatory ability of a feature set by the ratio of discernible object pairs to all object pairs that should be distinguished. Based on this measure, a new feature selection method is proposed. The method constructs positive-region-preserved samples from the massive data to find a feature subset with high discriminatory ability. Compared with other methods, the proposed method has two advantages. First, it can select a feature subset that preserves the discriminatory ability of all the features of the target massive data set within an acceptable time on a personal computer. Second, a lower bound on the proportion of discernible object pairs, among all object pairs that should be distinguished, achieved by the selected feature subset can be estimated before finding reducts. Furthermore, 11 data sets of different sizes were used to validate the proposed method. The results show that approximate reducts can be found in a very short time, and the discriminatory ability of the final reduct exceeds the estimated lower bound. Experiments on four large-scale data sets also showed that an approximate reduct with high discriminatory ability can be obtained in reasonable time on a personal computer.
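The core measure, the fraction of label-differing object pairs that a chosen feature subset can tell apart, can be sketched directly. This is a toy illustration of the ratio described above; the function and data are mine, not the paper's code.

```python
from itertools import combinations

def discernibility_ratio(rows, labels, feature_idx):
    """Fraction of object pairs with different labels (pairs that should be
    distinguished) that differ on at least one of the chosen features."""
    should, can = 0, 0
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:              # pair that must be distinguished
            should += 1
            if any(rows[i][f] != rows[j][f] for f in feature_idx):
                can += 1                        # pair is discernible
    return can / should if should else 1.0

# Toy decision table: two binary features, two classes.
rows = [(1, 0), (1, 1), (0, 1), (0, 0)]
labels = ['yes', 'yes', 'no', 'no']
full = discernibility_ratio(rows, labels, [0, 1])   # all features
sub = discernibility_ratio(rows, labels, [1])       # second feature only
```

Here the full feature set separates every cross-class pair (ratio 1.0), while feature 1 alone separates only half of them, so it would not preserve the positive region.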
Distributed Submodular Cover: Succinctly Summarizing Massive Data
How can one find a subset, ideally as small as possible, that well represents a massive dataset? That is, its corresponding utility, measured according to a suitable utility function, should be comparable to that of the whole dataset. Here, the utility is assumed to exhibit submodularity, a natural diminishing-returns condition prevalent in many data summarization applications. The classical greedy algorithm is known to provide solutions with logarithmic approximation guarantees compared to the optimum solution. However, this sequential, centralized approach is impractical for truly large-scale problems. In this work, we develop the first distributed algorithm – DISCOVER – for submodular set cover that is easily implementable using MapReduce-style computations.
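DISCOVER itself is distributed, but the classical sequential greedy it builds on is easy to sketch: keep adding the element with the largest marginal gain until the utility reaches the cover target. The names and toy coverage utility below are illustrative, not the paper's code.

```python
def greedy_submodular_cover(ground, f, target):
    """Greedy for submodular cover: grow S by largest marginal gain until
    f(S) >= target. For integral monotone submodular f this achieves a
    logarithmic approximation to the smallest covering set."""
    S = []
    while f(S) < target:
        best = max((e for e in ground if e not in S),
                   key=lambda e: f(S + [e]) - f(S))
        if f(S + [best]) == f(S):   # no element makes progress: target unreachable
            break
        S.append(best)
    return S

# Toy instance: cover the universe {0..9} with as few sets as possible.
sets = {'A': {0, 1, 2, 3}, 'B': {3, 4, 5}, 'C': {6, 7, 8, 9}, 'D': {0, 9}}
f = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
cover = greedy_submodular_cover(list(sets), f, target=10)
```

On this instance the greedy picks A, then C, then B, reaching full coverage with three sets; the centralized bottleneck is that each iteration scans the entire ground set, which is what the distributed formulation avoids.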
Datalike: Interview with Angelique Yameogo
Angelique Yameogo is studying for a PhD at the University of South Brittany in France. Her thesis focuses on fake news analysis using data science techniques. She has worked with several companies in Burkina Faso as an artificial intelligence engineer and mobile developer. She is skilled in HTML, CSS, JavaScript, pandas, scikit-learn, NLTK, and other tools.
Cloud Software Engineer 3
A Bachelor's Degree in Computer Science or a related technical field is highly desired and will be considered equivalent to two (2) years of experience. A Master's degree in a technical field will be considered equivalent to four (4) years of experience. A degree in Mathematics, Information Systems, Engineering, or similar will be considered a technical field. Eight (8) years of experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution, and at least six (6) years of experience developing software with high-level languages such as Java, C, and C++. Demonstrated ability to work with open-source (NoSQL) products that support highly distributed, massively parallel computation needs, such as HBase, Accumulo, and Bigtable.
Peraton drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world's leading mission capability integrator and transformative enterprise IT provider, we deliver trusted and highly differentiated national security solutions and technologies that keep people safe and secure. Peraton serves as a valued partner to essential government agencies across the intelligence, space, cyber, defense, civilian, health, and state and local markets. Every day, our employees do the can't be done, solving the most daunting challenges facing our customers. For Colorado residents: Colorado salary minimum: $90,500; Colorado salary maximum: $219,700. The estimate displayed represents the typical salary range for this position and is just one component of Peraton's total compensation package for employees. Other rewards may include annual bonuses, short- and long-term incentives, and program-specific awards. In addition, Peraton provides a variety of benefits to employees.
To evolve, AI must face its limitations
From medical imaging and language translation to facial recognition and self-driving cars, examples of artificial intelligence (AI) are everywhere. And let's face it: although not perfect, AI's capabilities are pretty impressive. Even something as seemingly simple and routine as a Google search represents one of AI's most successful examples, capable of searching vastly more information at a vastly greater rate than humanly possible and consistently providing results that are (at least most of the time) exactly what you were looking for. The problem with all of these AI examples, though, is that the artificial intelligence on display is not really all that intelligent. While today's AI can do some extraordinary things, the functionality underlying its accomplishments works by analyzing massive data sets and looking for patterns and correlations without understanding the data it is processing. As a result, an AI system relying on today's AI algorithms and requiring thousands of tagged samples only gives the appearance of intelligence.
CVAT Annotation
Structuring and training a machine learning model is not as easy as it may sound. Without the required data, it is difficult to achieve accurate results. Machine learning algorithms sit at the core of many AI programs that perform complex computations, enabling the systematic execution of learning tasks. The quality of the data is central to an algorithm, but how that data is applied at each stage also determines the accuracy of predictions. Whether data is limited or available in ample amounts, manual data annotation is not a practical solution when business demands are changing rapidly.
What are the biggest challenges in artificial intelligence, and how can we solve them?
Artificial intelligence (AI) is set to change how the world works. Although it's not perfect, artificial intelligence is a game changer. AI is the main engine of the digital revolution. The COVID-19 crisis has accelerated the need for human-machine digital intelligent platforms facilitating new knowledge, competences, and workforce skills: advanced cognitive, scientific, technological and engineering, and social and emotional skills. In the AI and robotics era, there is high demand for scientific knowledge, digital competence, and high-technology training in a range of innovative areas of exponential technologies, such as artificial intelligence, machine learning and robotics, data science and big data, cloud and edge computing, the Internet of Things, 5G, cybersecurity, and digital reality.